Early detection of variants of concern via funnel plots of regional reproduction numbers

Early detection of the emergence of a new variant of concern (VoC) is essential to develop strategies that contain epidemic outbreaks. For example, knowing in which region a VoC starts spreading enables prompt actions to circumscribe the geographical area where the new variant can spread, by containing it locally. This paper presents ‘funnel plots’ as a statistical process control method that, unlike tools whose purpose is to identify rises of the reproduction number (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${R}_{t}$$\end{document}Rt), detects when a regional \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${R}_{t}$$\end{document}Rt departs from the national average and thus represents an anomaly. The name of the method refers to the funnel-like shape of the scatter plot that the data take on. Control limits with prescribed false alarm rate are derived from the observation that regional \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${R}_{t}$$\end{document}Rt's are normally distributed with variance inversely proportional to the number of infectious cases. The method is validated on public COVID-19 data demonstrating its efficacy in the early detection of SARS-CoV-2 variants in India, South Africa, England, and Italy, as well as of a malfunctioning episode of the diagnostic infrastructure in England, during which the Immensa lab in Wolverhampton gave 43,000 incorrect negative tests relative to South West and West Midlands territories.

www.nature.com/scientificreports/ genomic sequencing methods to a more targeted and purpose-driven framework, to effectively detect potential VoCs and prevent their spread. The keystone of our approach is the use of statistical quality control to monitor the homogeneity of the timevarying estimated reproduction numbers of the disease in different regions of a country (or, more in general, in different geographical areas).
The novelty of the approach consists in statistically comparing the reproduction numbers of different regions in order to detect if some territories behave as outliers. As a key feature of the proposed methodology, a rigorous statistical threshold is derived, which accounts for the different sample sizes, i.e., the number of infectious cases in a region. In the general context of healthcare monitoring, this sample size issue had come under the spotlight in the early 2000, in a series of works [14][15][16] . An example was the detection of abnormal mortality rates in cardiac surgery wards 14 : through the characterization of the baseline variability, one could build control charts with statistical limits which, if exceeded, suggested the existence of an abnormal cause explaining the anomalous mortality. When the key performance indicators were affected by the sample size, it was shown that their monitoring could rely on so-called funnel plots 17,18 .
In the case of epidemics, anomalies can be detected by a comparative monitoring of the regional effective reproduction number, R t , whose variance depends on the number of new infected subjects in the given region. Closely related to R t is the so-called basic reproduction number R 0 (i.e. R t at the beginning of the epidemic outbreak) whose expression is obtained from mathematical models. For the analytical and numerical computation of R 0 for general structured population models, see [19][20][21] .
A large regional R t may have a special cause, such as the emergence of a new VoC, or may just be the effect of statistical fluctuations due to sampling noise. In this work, to monitor the onset of statistical anomalies in regional R t 's, we derive suitable funnel plots whose control limits can reveal abnormal trends, while keeping false alarms under control. We validate our proposed methodology using publicly available epidemiological data from Italy, England, India and South Africa: we show that the crossing of control limits promptly reveals the emergence of new more transmissible variants or the malfunctioning of the diagnostic infrastructure.
In conclusion, we notice that the utility of funnel plots is not limited only to epidemiological setting, but have also meaningful clinical implications. Indeed, several papers 22,23 show that VOCs have a reduced sensitivity to both antiviral drugs and monoclonal antibodies. The capacity to detect VOCs earlier means the possibility to improve the appropriateness of early therapies and to reduce hospitalizations and deaths.

Results
We apply the funnel plot methodology to five case studies, corresponding to different stages of the COVID-19 pandemic, chosen because of their relevance to the spread of VoC's or to flaws of the diagnostic infrastructure. Two case studies refer to England (initial spread of the Omicron variant in December 2021 and large failure of a diagnostic lab in September 2021), and the other three to Italy (initial spread of the Omicron variant in December 2021), India (first emergence of the Delta variant in February 2021), and South Africa (first emergence of the Omicron variant in November 2021). In addition, the nine English regions are monitored over a 18-month period from December 2020 to June 2022.
In all cases, we focus on four key dates. The first date corresponds to a situation of statistical homogeneity: when variants are uniformly spread in the country and contact rates do not vary much across regions, differences between estimated R t 's are exclusively due to natural variability and the regional R t 's are expected to lie within the funnel, centered around the national R t (see Methods). The second and the third dates refer to the disruption of the natural variability: when a new VoC starts spreading, at first it colonizes in particular a few territories, whose behavior becomes abnormal with respect to the national one. This is highlighted by the fact that the corresponding R t 's first cross the funnel limits and then clearly move outside the limits. The last date corresponds to a new homogeneity, typically established around a higher R t : the VoC is now uniformly spread in the country, thus restoring the condition of natural variability. Finally, to have a snapshot of the whole period under study, the standardized R t 's with ± 3.09 sigma are plotted on a Bonferroni control chart, which is a standard univariate control chart whose control limits are adjusted according to the Bonferroni correction (see Methods for details). Due to its statistical background, the scope of the new control method is not restricted to VoC monitoring, but can detect other kinds of anomalies, such as those related to testing availability or malfunctioning buffer factories: we discuss an example of such an anomaly in our Immensa case study.
Hereafter, the infectious cases at day t are the total number of individuals that are infected and infectious at day t, while the new cases at day t are the number of subjects who become infectious at that time.

Spread of the Omicron variant in Italy.
We first apply the funnel plot methodology to the Italian regional data in the period from 4 December to 3 January 2022, based on epidemiological indicators released daily by the Civil Protection Department, which provides 21 regional time series (for 19 regions and the 2 autonomous provinces of Trento and Bolzano). The Delta variant was dominant in Italy until December 2021, when the Omicron variant started to spread across the country.
The results are summarized in Fig. 1. In the Panels a-d, the estimates of Italian regional R t 's are plotted against the infectious cases on four selected dates. On 7 December 2021 (see panel a), differences between estimated R t 's were due to natural variability alone and the 21 points lay within the funnel limits. On 22 December 2021, Lombardia (dark red) crossed the alarm limit (see panel b) and on 24 December 2021 (see panel c) it was definitely outside the upper alarm limit. In fact, as confirmed by a retrospective survey by the Italian National Institute of Health published on 31 December 2021 24  www.nature.com/scientificreports/ well and, by 2 January 2022, Lombardia was absorbed again within a funnel, now with a higher mean than in early December (see panel d).
We can monitor the trend by plotting the standardized R t 's on a Bonferroni control chart with ± 3.09 sigma limits (see panel e), where the arrival of the Omicron variant in Lombardia in mid-December is clearly detectable.  1. Monitoring regional reproduction numbers (R t 's): funnel plots and control chart. Panels a-d show the Italian regional R t 's (colour-coded circles, see Supplementary {1} for the legend), plotted against the infectious cases at four selected times. When the epidemic evolution is homogenous across regions, differences between R t 's are exclusively due to natural statistical variability and the circles are expected to lie inside the black alarm limits in 99.8% of the cases. The alarm limits have the shape of a funnel because the variance of the estimated R t is inversely proportional to the number of infectious cases. The central dashed line represents the average R t . A circle is out of statistical control if it lies outside the black funnel. Out-of-control circles might therefore reveal anomalies that disrupt the homogeneity between regions. In (a-d), the majority of the points, lying in the funnel, are essentially indistinguishable and therefore not even named. On 22 December 2021, Lombardia (dark red) crossed the alarm limit and on 24 December 2021 it was completely outside the upper alarm limit. As confirmed by a survey by the Italian National Institute of Health, Lombardia was the first Italian region to be colonized by the Omicron variant. As the other regions were colonized too, the distribution of their R t 's moved upward and, on 2 January 2022, Lombardia was again inside the funnel. The trend can be monitored by plotting the standardized R t 's on a Bonferroni control chart with ±3.09 (e), where the introduction of the Omicron variant in Lombardia in mid-December is clearly visible.  18 months. In this period, seven major events, labelled from A to G, are visible, whose plausible explanations are conjectured below. A-Alpha variant: although no crossing is observed, the curves form two clusters, and the upper is formed by the three regions where the alpha variant first became dominant. No crossing occurs because, rather than starting in a unique region, the variant colonized three regions at the same time, hindering out-of-control detection. B-Outbreaks in Yorkshire: during Spring 2021, the number of infection cases in the region was slowly decreasing and then it had a huge peak, due to very high numbers of manufacturing jobs and related high-exposure workplaces, leading to outstanding outbreaks such as the one in a Selby warehouse with more than 700 employees. C-Delta variant first arrival in the North West: in May 2021 the Delta variant, originated in India, started its colonization of England from the North West region. As a characteristic feature of the control chart, the curve that first crosses the upper limit in correspondence of a new VoC, later on is often going to cross also the lower limit if its R t is the first one to decrease, as happened to the North West. D-Immensa scandal: see Fig. 4 for a discussion of the massive lab malfunctioning that perturbed case recording in the South West and South East. Compared to the rise of a new VoC, a specular pattern is observed: when the malfunctioning is fixed, a fake outbreak is observed and the red curve crosses the upper control limit. E -Omicron 1 variant first arrival in the London region: see www.nature.com/scientificreports/ malized curves are expected to lie within the limits. Points outside the limits highlight a disruption of the statistical homogeneity across regions, which should be investigated to unveil the root cause of the anomaly. Figure 2 reports seven major events, labelled from A to G, along with plausible conjectured explanations: the emergence or the arrival of the VoCs (Alpha 25,26 , Delta 27 , Omicron 28,29 and Omicron sub-variants 11 ), the malfunction of swab factories (further analyzed in Fig. 4) 30,31 , some incidents of violation of lockdown restrictions 32-34 , and changes in the testing policies 35 .

Emergence of the Delta variant in India.
We applied our methodology to epidemic data from India in the period 13 February-5 March 2021, when the Delta variant emerged and started spreading from the state of Maharashtra. Panels a-d of Fig. 3 show funnel plots at four selected times, where colour-coded circles represent the R t 's of the 36 Indian states. While on February 13 all circles fell within the funnel, on February 16 the state of Maharashtra (dark red) crossed the alert threshold (in correspondence with the initial spread of the Delta variant), further departing from the mean on February 22. Lastly, on 4 March 2021, the R t 's of all regions but Kerala (orange) shaped a new funnel with a higher mean, which again incorporated Maharashtra. The peculiar dropping of Kerala's R t below the lower alert threshold, despite the very high number of infectious cases, might be explained by the co-circulation of Alpha and Delta variants during the same period, resulting in a lower R t than in the areas predominantly hit by the Delta variant.
In the Bonferroni control chart (see panel e), the rise of the Delta variant in Maharashtra is clearly visible since mid-February 2021. One month later, on March 17, it was disclosed that a 10-lab research consortium had alerted the Union Health Ministry about a new variant spreading in Maharashtra 37 , leading to a press release on the new VoC a week later 38 . This case study suggests that the use of statistical control methods would have enabled an earlier detection of the variant.
Emergence of the Omicron variant in South Africa. From 7 November to 4 December 2021, the Omicron variant colonized South Africa, starting with the province of Gauteng. Panels f-i of Fig. 3 show four funnel plots, where colour-coded circles represent the R t 's of the South African provinces. Until the very beginning of November 2021, the Delta variant was prevalent and the differences in R t across provinces merely resulted from natural fluctuations (see panel f). By mid-November the Gauteng province crossed the upper alert threshold (see panel g) and then further diverged (see panel h). This is precisely the timing when the Omicron variant was first identified, as declared by the WHO 39 , and became a threat 40 . By 3 December 2021, Gauteng was reabsorbed within the funnel, now with a much higher mean, following the spread of Omicron in the other provinces and the consequent rise of their R t 's (see panel i). The Bonferroni control chart with ± 3.09 sigma limits (panel j) clearly shows the out-of-control trajectory of the Gauteng province (red). Immensa scandal in England. Our last case study concerns England in the period from 27 August to 25 September 2021. Panels a-d of Fig. 4 display four funnel plots in selected dates, with colour-coded circles corresponding to the R t 's of the English regions. On 5 September 2021, all English regions were within the funnel (panel a). By 9 September 2021, the South West (red) had crossed the lower alarm limit (panel b) and remained below the lower limit for about two weeks (panel e). The timing of this swing coincides with the period during which the Immensa lab in Wolverhampton gave some 43,000 incorrect negative tests relative to South West and West Midlands territories 30,31 . While the suspension of lab operations came in mid-October, the Bonferroni control chart indicated an out-of-control condition already in early September and would have allowed a much earlier detection of the anomaly.

Trajectory plots and funnel movies.
A better understanding of the time evolution of regional R t 's is achieved when complementing funnel plots with the time dimension. This can be done in two ways. The first approach is to display the trajectories of the regional R t 's: an example relative to South Africa is provided in Fig. 5. Alternatively, the funnel plot can be animated, thus yielding a "funnel movie", where both the trajectories and the shapes of the funnels are iteratively updated (see Supplementary {2} for the description and {3} for the movies). As observed in 41 , when R t is plotted against the number of infective subjects, the trajectories exhibit a peculiar clock-wise spiral-shaped pattern. An analogous behavior was observed also in [ 42 , Fig. 2], where trajectories were plotted in the plane of infected cases against "cooperators", a variable connected with R t .
Monitoring regional R t 's. In addition to the Bonferroni control charts, a further monitoring and visualization tool is obtained by plotting the regional R t 's along with the national R t and with control limits defining the in-control band for the specific region. A distinct plot is needed for each region, because the width of the www.nature.com/scientificreports/ in-control band depends on the number of the infectious subjects in that region. Hence, differently from the Bonferroni control chart previously introduced, as many plots are needed as the number of regions. As an advantage, this visualization provides a direct display of the regional R t and may therefore be easier to understand. Examples of these plots for Italy, England, India and South Africa are given in the Supplementary {1}.

Discussion
We proposed funnel plots and the associated Bonferroni control chart as a valuable framework for the early detection of a new emerging or imported VoC and showed their effectiveness in six real-life scenarios based on epidemic data from Italy, India, South Africa and England. These case studies demonstrate that the proposed methodology, besides being direct and inexpensive, allows the early detection of anomalies due to different root causes, ranging from the emergence of a new VoC, and its colonization of a country, to flaws in the diagnostic system, such as the Immensa COVID-19 testing scandal in England. Once the method identifies anomalous patterns, further inquiries are needed to assess their causes. Funnel plots provide an innovative and statistically rigorous tool for monitoring the statistical homogeneity of the distribution of regional R t 's. Our method can be seen as an extension to epidemiology of the funnel charts advocated by Spiegelhalter in the assessment and comparison of institutional performances in the healthcare sector 17 . Before then, funnel plots were mainly known as a standard tool for investigating biases in meta-analysis studies. As such, they have also been employed in the context of COVID-19 meta-analyses, see e.g., 43 .
Prompt identification of a VoC before its large-scale spread, leading to impactful public health implications, is a key goal in the control of the SARS-CoV-2 pandemic and in preventing and controlling future pandemics. However, as the relentless and flashy worldwide dissemination of the Omicron variant has largely proven, some doubts remain about the most effective way to achieve this goal. Although some rRT-PCR-based algorithms and/or NAAT-based screening assays have been proposed for the early identification of VoCs 44,45 and might be implemented in routine laboratories 46 , Whole Genome Sequencing, or at least the complete or partial sequencing of the spike (S) protein-gene, remains the only tool to both effectively identify the different variants and follow the evolution of SARS-CoV-2 47,48 . However, WGS is time consuming, expensive, and needs dedicated structures and personnel with technical expertise to be timely implemented. Furthermore, it is challenging to be applied on low viral loads samples 49 .
Exactly in this breach, the potential support of surveillance based on funnel plots and Bonferroni charts might accelerate the detection of a new VoC, without requiring, at least initially, the backup of a specialized microbiology laboratory. The value of WGS is undisputed, but, in a setting with limited resources, easy and inexpensive data-driven statistical methodologies for surveillance may support more targeted and focused genomic sequencing. Therefore, besides being extremely useful where sequencing is lacking due to scarce resources, the funnel plot framework is also precious to inform and suggest where sequencing efforts should be concentrated. It also allows the detection of anomalies that cannot be revealed by sequencing, such as failures of the testing infrastructure, as shown by the Immensa case study.
The statistical underpinning of the methodology takes into account the natural variability of the phenomenon, thus preventing false alarms even in the presence of noisy data, e.g., due to late registration of new cases. While polished data may be available with weeks of delay, funnel plots can work in real-time using the latest data, a crucial feature to allow an early detection of anomalies and hence prompt interventions. For instance, the Italian funnel plots of the first case study were fed by daily published unprocessed data.
Other authors have proposed the application of statistical process control methods for monitoring the evolution of the COVID-19 pandemic. For instance 50 , proposed hybrid control charts to detect the start and end of exponential growth in reported deaths within a geographic area. An interesting use of hybrid control charts was investigated in 51 , keeping under control exponential and non-exponential growth and decline of cases, disaggregated at the regional and subregional level, to inform local mitigation and containment strategies. Conversely, our approach leverages the characterization of the collective distribution of regional R t 's: we do not monitor each region individually, but rather surveil the homogeneity of the distribution.
In view of its nature, the proposed method reveals the loss of statistical stability, but cannot of course unravel its cause. Consistently with established quality control practices, it should be used to trigger an inspection. Therefore, the funnel plot is not a VoC-detector, but an anomaly detector: early detection enables focused Figure 3. Funnel plots help detect anomalies: spread of the Delta variant in India and of the Omicron variant in South Africa and England. India: (a-d) display the funnel plots at four selected times, with colour-coded circles corresponding to the R t 's of the Indian states. On 13 February 2021, all points are within the funnel, but on 16 February 2021, when the Delta variant starts spreading, there is an out-of-control point corresponding to Maharashtra (dark red), which on 22 February 2021 is further apart from the mean. Finally, on 4 March 2021 the R t 's of all regions except Kerala (orange) converge to a new distribution characterized by a higher R t . The trend can be monitored by plotting the standardized R t 's on a Bonferroni control chart with ±3.09 sigma limits, see Panel e, where the rise of the Delta variant in Maharashtra is clearly visible. South Africa: (f-i) display the funnel plots at four selected times, with colour-coded circles corresponding to the R t 's of the South African provinces. The rise of the Omicron variant in the Gauteng province (red) is well visible both in the funnel plots and in the Bonferroni control chart reported in (j). England: (k-n) display the funnel plots at four selected times, with colour-coded circles corresponding to the R t 's of the English regions. The spread of Omicron in England started from the London region (green), whose R t had already crossed the alarm limit on 10 December, when, as seen in Fig. 3E of 29 , the daily proportion of Omicron infections did not exceed 25%. As the other regions were colonized, the distribution of their R t moved upward and, on 23 December, the London region was again inside the funnel, as also seen in the Bonferroni control chart reported in Panel (o). www.nature.com/scientificreports/ inquiries aimed at discovering the cause of the anomaly. In funnel plots, a point lying outside the funnel limits is associated with high confidence to some anomaly of the effective reproduction number. This may be due to several special causes of variations, such as VoCs, outbreaks due to violations of containment measures, failures of the diagnostic infrastructure (such as the Immensa scandal). In the absence of special causes, all funnel plots in the paper are designed so that all points are inside the alarm limits in 99.8% of the cases (or, equivalently, so that the false alarm probability is 0.2%). (a-d) display the funnel plots at four selected times, with colour-coded circles corresponding to the R t 's of the England regions. On 5 September 2021, all circles were inside the funnel, but on 9 September 2021 there was an out-of-control point below the lower alarm limit corresponding to South West (red), which was further apart from the mean on 14 September 2021, when also West Midlands (brown) went below the lower limit. Finally, on 20 September 2021 the R t 's of all regions returned within the limits. The anomalous decrease of R t in the South West corresponds to the period during which the Immensa lab (Wolverhampton) gave some 43,000 incorrect negative tests relative to South West and West Midlands. The whole trend can be monitored by plotting the standardized R t 's on a Bonferroni control chart with ± 3.09 sigma limits, (e). Lab operations were suspended in mid-October as a consequence of the malfunction, while the control chart indicated an out-of-control condition as early as late August.

Methods
Data. Data regarding new positive cases were obtained from publicly available sources: https:// github. com/ pcm-dpc/ COVID-19/ tree/ master/ dati-regio ni for Italian data, https:// data. covid 19bha rat. org/ for Indian data, https:// media hack. co. za/ datas tories/ coron avirus/ data/# for South African data, https:// coron avirus. data. gov. uk/ detai ls/ downl oad for English data. Following 52 , we assumed a discretized lognormal distribution for the serial interval, with parameters chosen in accordance with 53 . To correct systematic errors in the data, partly due to the weekly periodicity, partly due to delays and other reporting errors, all data were filtered using a double seven-day moving average.

Funnel plots and Bonferroni control charts.
In a funnel plot, a measured or estimated quantity is plotted against an interpretable measure of its precision. A funnel plot is composed of four elements 17 : (i) an indicator Y that represents the quantity to be monitored, (ii) a reference value θ that specifies the expectation of the indicator, (iii) a precision parameter ρ that determines the accuracy with which the indicator is measured, (iv) the control limits y lower , y upper that specify the boundaries of the out-of-control region. An example of funnel plot can be seen in Fig. 1. The dot (ρ i , y i ) is associated with the i-th region, where ρ i is the number of infectious cases in the region and y i is the region's reproduction number R t at a given time t . The horizontal line Y = θ shows the national average R t and the funnel-shaped pair of control limits y lower and y upper shows where we would expect the regions to lie if their R t 's were statistically indistinguishable from one another, see Panel d in Fig. 1.
In several circumstances, an exact or approximate normal distribution of the indicator Y can be assumed where g is a suitable function of θ 17 such that Var[Y ] = g(θ)/ρ . Under this null hypothesis, with probability 1 − α,  In the first date, before the spread of Omicron, the average R t was below 1 and all points were inside the grey funnel plot. Then, Gauteng's R t (red) moved upwards, followed by the other provinces. Overall, Omicron caused an upward escape of the province where it first became dominant (Gauteng, red line), followed by a collective drift of the R t 's of other provinces, until a new funnel, i.e., the black one, was established at a higher level. The trajectories go leftwards when R t is less than one, because the infectious cases tend to decrease, while the trajectories go rightwards when R t is greater than one. Therefore, the trajectories exhibit a characteristic clock-wise trend 41 . Supplementary {2} describes in detail how to obtain an animated representation of this phenomenon by iteratively updating the R t 's and the funnels shape, leading to the "funnel movies" available in Supplementary {3}. www.nature.com/scientificreports/ where z α/2 is such that P Z ≤ z α/2 = 1 − α 2 for a standard normal variable Z . For instance, z α/2 = 1.96 , when α = 5% , and z α/2 = 3.09 , when α = 0.2% . This means that, in 100(1 − α)% of the cases, Y is expected to lie within the lower and upper control limits defined as By introducing the Z-score we have that P(|z i | ≤ z α/2 ) = 1 − α . In Statistical Process Control, the common practice is to select a false alarm probability as small as α = 0.2% , corresponding to z α/2 ≥ 3.09. A Z-score whose absolute value is greater than z α/2 is said to be out of (statistical) control and deemed worthy of study to identify a special cause of variation that explains its departure from the mean. Note that there is a 0.2% probability of reporting an out-of-control point when no special cause of variation is actually perturbing the process and the outlier arises by pure chance under common causes of variation.
When monitoring n units of analysis, e.g., the R t of n regions within a country, due to the multiple comparison problem, the false positive rate could become unacceptably large. A simple way to address this problem is the Bonferroni correction that replaces α with α/n 54 .
When the indicators y i measure a frequency of occurrence, e.g., the mortality rates in heart surgery units, it is reasonable to assume a binomial model, with θ representing the probability of the event and ρ i the number of surgeries in the i-th unit. For the binomial model, the variance of y i is θ(1 − θ)/ρ i so that, given θ , the variance of y i is completely specified. For a large enough ρ , the binomial converges to a normal random variable that follows distribution (1) with g(θ) = θ(1 − θ) . An analogous case is when the products ρ i y i are Poisson distributed with expectation ρ i θ . If ρ i θ is greater than 30, the indicators y i are then normally distributed as (1) with g(θ) = θ. Therefore, for both the ideal binomial and Poisson model, estimating the mean of y i suffices to specify both the centerline and the alarm limits of the funnel plot.
However, as discussed in 18 , if one lets the variance be specified by the mean, it very often happens that the fraction of units of analysis that lie outside the ideal alarm limits greatly exceeds the theoretical false positive rate. This phenomenon, well known in the statistical literature, goes under the name of overdispersion 18 . This can be dealt with by modifying (1) with the introduction of an overdispersion parameter φ to be estimated from data: The control limits and the Z-scores are redefined accordingly as When the indicators to be monitored are time series depending on a time index t , i.e., y i = y i (t) , a distinct funnel plot can be drawn for each time instant. For the purpose of statistical monitoring, the relevant information can be summarized in a Bonferroni control chart where the trends of the Z-scores are plotted in time against Bonferroni limits, see for instance panel e in Fig. 1. Under (2), we have that z i ∼ N[0, 1] , so that, when the Z -scores are plotted on a control chart with zero centerline and Bonferroni limits equal to ±z α/(2n) , the probability of one or more dots lying outside the limits is equal to α. Distribution of regional R t 's. The reproduction number at time t , named R t , captures the number of secondary infections from a population including both susceptible and immune individuals. For its estimation, a range of model frameworks and estimation procedures have been proposed 55 . Herein we adopt the approach of Cori et al. 56 that makes minimal assumptions about the mathematical model of the epidemic process. Cori's formula uses the time series of the new cases and estimates of the distribution of the generation time, i.e., the time between infections.
According to 56 the estimate R t of the instantaneous reproduction number R t is obtained as where I t denotes the daily number of new infected cases and w s are the coefficients, adding up to one, of the infectivity profile, often approximated by the distribution of the serial interval. The denominator y lower = θ − z α/2 g(θ)/ρ y upper = θ + z α/2 g(θ)/ρ (2) Y |θ , ρ ∼ N θ, φg(θ )/ρ y lower = θ − z α/2 φg(θ )/ρ y upper = θ + z α/2 φg(θ )/ρ www.nature.com/scientificreports/ can be interpreted as the total infectiousness of individuals that are currently infected at time t . In view of the typical models of the infectivity profile, e.g., lognormal or gamma density functions, t is a smoothed version of the time series I t of daily new cases. If seven-day moving averages are used to filter out weekly oscillations, I t is already smooth, and the resulting t is insensitive to the precise shape of the infectivity profile. This feature may prove helpful when a new VoC arises whose infectivity profile is unknown, or only approximately known.
To derive the distribution of R t , we only assume that disease transmission follows a Poisson distribution with mean R t t : Typically, R t � > 30 , so that a normal approximation can be used: For the sake of interpretability, rather than using the notion of total infectiousness t , it is more intuitive to refer to the total number of infectious individuals. To this aim, we introduce the parameter i.e., the inverse of the mean serial interval, which, for the well-known SIR model, corresponds to the removal rate 57 . Then, ρ t = � t /γ represents the number of individuals that are infectious at time t . Letting θ = R t , g(θ) = R t /γ , it follows that Comparing the above distribution with (1), it follows that, for any given t , the scatter plot of R t against ρ t is indeed a funnel plot. Also in this case, it is convenient to introduce an overdispersion parameter φ , so that, in accordance with (2), the final model becomes A useful byproduct of introducing overdispersion is that φ takes into account the effect that possible errors or uncertainties in the estimated mean serial interval has on the variance of R t . Indeed, the variance of R t is inversely proportional to γ , but the effect of a wrong γ is automatically compensated when estimating φ from the data. where e = Y − � θ is the vector of the residuals. Recalling that g(θ) = θ/γ , the overdispersion parameter φ is estimated as φ = γ σ 2 / θ . Data winsorization can be performed, as detailed in 17 , to reduce the effect of possibly spurious outliers. www.nature.com/scientificreports/ Estimates θ and φ are needed to compute the funnel at time t , as well as the standardized residuals z i to be plotted in the Bonferroni control chart. The estimate φ is computed from (6) using the set of R i t that were in control at time t − 1 . The centerline θ is obtained by projecting at time t a weighted linear regression estimated from the set of R i t that were in control at times t − 1, t − 2, and t − 3 . The weights are given by the numbers x i of infectious individuals at the same times. This procedure yields an estimate of the current θ that tracks the trends of the national R t , but is still fairly robust thanks to the use of the last three data points. At the beginning (and in the rare cases when all units are out of control), the whole sets R i t , ρ i t are fed to the estimator.

Ethics declarations.
No ethics review and informed consent/animal welfare protocols were needed, because the research entirely relied on epidemiological time series that are publicly accessible.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.